Identification of Human Gene Structure Using Linear Discriminant Functions and Dynamic Programming
Abstract
Developing advanced techniques to identify gene structure is one of the main challenges of the Human Genome Project. Discriminant analysis was applied to the construction of recognition functions for various components of gene structure. Linear discriminant functions have been developed for the recognition of splice sites and of 5'-coding, internal exon, and 3'-coding regions. A gene structure prediction system, FGENE, has been developed based on the exon recognition functions. We compute a graph of the mutual compatibility of different exons and represent gene structure models as paths of this directed acyclic graph. To select an optimal model, we apply a variant of the dynamic programming algorithm to search for the path in the graph with the maximal value of the corresponding discriminant functions. Prediction by FGENE for 185 complete human gene sequences has 81% exact exon recognition accuracy and 91% accuracy at the level of individual exon nucleotides, with a correlation coefficient (C) of 0.90. Testing FGENE on 35 genes not used in the development of the discriminant functions shows 71% accuracy of exact exon prediction and 89% accuracy at the nucleotide level (C = 0.86). FGENE compares very favorably with the other programs currently used to predict protein-coding regions. Analysis of uncharacterized human sequences based on our methods for splice sites (HSPL, RNASPL), internal exons (HEXON), all types of exons (FEXH), human (FGENEH) and bacterial (CDSB) gene structure prediction, and recognition of human and bacterial sequences (HBR) (to test a library for E. coli contamination) is available through the University of Houston, Weizmann Institute of Science network server and a WWW page of the Human Genome Center at Baylor College of Medicine.
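The path search described in the abstract can be illustrated with a minimal sketch. It is not FGENE's implementation: the `Exon` class, the `compatible` rule (a minimum intron length of 60 nucleotides), and the additive scoring are assumptions made for illustration, and the sketch ignores reading-frame compatibility and the distinction between 5', internal, and 3' exons. It only shows the general idea of treating candidate exons as nodes of a directed acyclic graph and using dynamic programming to find the path with the highest total score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int    # first nucleotide of the candidate exon (inclusive)
    end: int      # last nucleotide of the candidate exon (inclusive)
    score: float  # hypothetical discriminant-function value for this exon


def compatible(a: Exon, b: Exon, min_intron: int = 60) -> bool:
    """Assumed compatibility rule: b may follow a if the gap between
    them is at least min_intron nucleotides long."""
    return b.start - a.end - 1 >= min_intron


def best_gene_model(exons, min_intron: int = 60):
    """Return (total_score, path) for the highest-scoring chain of
    mutually compatible exons."""
    if not exons:
        return 0.0, []
    # Sorting by start gives a topological order: an edge a -> b
    # requires b.start > a.end >= a.start.
    order = sorted(exons, key=lambda e: (e.start, e.end))
    best = [e.score for e in order]   # best score of a path ending at each exon
    prev = [None] * len(order)        # back-pointers for path recovery
    for j, ej in enumerate(order):
        for i in range(j):
            candidate = best[i] + ej.score
            if compatible(order[i], ej, min_intron) and candidate > best[j]:
                best[j] = candidate
                prev[j] = i
    # Trace back from the best-scoring end node.
    j = max(range(len(order)), key=best.__getitem__)
    total = best[j]
    path = []
    while j is not None:
        path.append(order[j])
        j = prev[j]
    return total, path[::-1]


# Made-up candidate exons with made-up scores.
candidates = [
    Exon(0, 99, 2.0),
    Exon(50, 149, 3.0),
    Exon(200, 299, 1.5),
    Exon(400, 499, 2.5),
]
total, model = best_gene_model(candidates)
```

With these made-up candidates, the overlapping exon at 50-149 scores highest on its own but is left out, because the chain through the other three exons has a larger total.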
Similar resources
Estimating the Saturated Hydraulic Conductivity of Soil Using Gene Expression Programming Method and Comparing It with the Pedotransfer Functions
Saturated hydraulic conductivity is an important physical property of soil that affects water movement in soil. Since measuring saturated hydraulic conductivity by direct methods in the field or in the laboratory is hard, time-consuming, and costly, indirect methods are used instead. The aim of this study is to estimate the saturated hydraulic conductivity from other soil prope...
Investigation for an Approach to Optimise the Structure of Human Force
Abstract This paper proposes an approach to find an optimum structure for educational levels of human forces. To this end, a Linear Programming (LP) Model integrated with a Social Accounting Matrix (SAM) was employed. The integrated model was employed using the SAM of Golestan Province of Iran. It was demonstrated that when unemployment is the result of inconsistency between supply and demand...
A New Approach to Solve Fully Fuzzy Linear Programming with Trapezoidal Numbers Using Conversion Functions
Recently, fuzzy linear programming problems have been considered by many researchers. Several models are offered in the fuzzy linear programming literature, and various methods have been suggested to solve these problems. One of the most important of these problems to be considered recently is Fully Fuzzy Linear Programming (FFLP), in which all coefficients and variables of the prob...
Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method
Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...
Multi-Group Classification Using Interval Linear Programming
Among the various statistical and data mining discriminant analysis methods proposed so far for group classification, linear programming discriminant analysis has recently attracted researchers' interest. This study evaluates multi-group discriminant linear programming (MDLP) for classification problems against well-known methods such as neural networks and support vector machines. MDLP is less compli...
Journal title: Proceedings. International Conference on Intelligent Systems for Molecular Biology
Volume: 3
Publication date: 1995